 secure and trustworthy


Large Language Model Bias Mitigation from the Perspective of Knowledge Editing

arXiv.org Artificial Intelligence

Existing debiasing methods inevitably make unreasonable or undesired predictions because they are designed and evaluated to achieve parity across different social groups but leave aside individual facts, which modifies existing knowledge. In this paper, we first establish a new bias mitigation benchmark, BiasKE, leveraging existing and additionally constructed datasets, which systematically assesses debiasing performance using complementary metrics on fairness, specificity, and generalization. Meanwhile, we propose a novel debiasing method, Fairness Stamp (FAST), which enables editable fairness through fine-grained calibration on individual biased knowledge. Comprehensive experiments demonstrate that FAST surpasses state-of-the-art baselines with remarkable debiasing performance while not hampering overall model capability for knowledge preservation, highlighting the prospect of fine-grained debiasing strategies for editable fairness in LLMs. Pre-trained Large Language Models (LLMs) have demonstrated exceptional performance on many tasks (Devlin et al., 2018; Floridi & Chiriatti, 2020; Brown et al., 2020). However, the encoded social stereotypes and human-like biases inevitably cause undesired behaviors when deploying LLMs in practice (Zhao et al., 2019; Navigli et al., 2023; Sheng et al., 2021). Existing approaches to mitigate biases in LLMs are mainly categorized into: (1) Fine-tuning (Zmigrod et al., 2019; Webster et al., 2020; He et al., 2022; Liang et al., 2020; Lauscher et al., 2021), which includes techniques such as re-balanced corpus pre-training, contrastive learning, projection methods, and efficient parameter tuning. 
However, existing techniques treat social groups as interchangeable (Gallegos et al., 2023) and neutralize protected attributes of different social groups in model inputs or outputs, while ignoring or ... Furthermore, existing debiasing evaluation metrics mainly focus on the degree of bias, but fail to measure whether the model retains its original knowledge (Gallegos et al., 2023) of discerning reasonable disparities among different social groups.


The Effect of Model Size on LLM Post-hoc Explainability via LIME

arXiv.org Artificial Intelligence

Large language models (LLMs) are becoming bigger to boost performance. However, little is known about how explainability is affected by this trend. This work explores LIME explanations for DeBERTaV3 models of four different sizes on natural language inference (NLI) and zero-shot classification (ZSC) tasks. We evaluate the explanations based on their faithfulness to the models' internal decision processes and their plausibility, i.e., their agreement with human explanations. The key finding is that increased model size does not correlate with plausibility despite improved model performance, suggesting a misalignment between the LIME explanations and the models' internal processes as model size increases. Our results further suggest limitations regarding faithfulness metrics in NLI contexts.
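To make the notion of plausibility above concrete, here is a minimal sketch of one way agreement between an explanation and a human rationale could be scored. It is an illustrative assumption, not the metric used in the paper: the function names `top_k_tokens` and `plausibility_iou`, the intersection-over-union score, and the example weights are all hypothetical, and the sketch assumes each token appears only once in the input.

```python
def top_k_tokens(attributions, k):
    """Return the k tokens with the largest absolute attribution weight.

    `attributions` maps token -> weight, e.g. the per-token weights that a
    LIME-style explainer assigns (hypothetical input format).
    """
    ranked = sorted(attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {token for token, _ in ranked[:k]}


def plausibility_iou(attributions, human_rationale, k=None):
    """Intersection-over-union between top-k explanation tokens and the
    tokens a human annotator marked as important.

    By default k is the size of the human rationale, so both sets have
    comparable size (an assumption of this sketch).
    """
    human = set(human_rationale)
    if k is None:
        k = len(human)
    predicted = top_k_tokens(attributions, k)
    union = predicted | human
    if not union:
        return 0.0  # nothing to compare
    return len(predicted & human) / len(union)


# Made-up attribution weights for illustration only.
attributions = {"dog": 0.6, "sleeping": 0.5, "the": 0.05, "is": -0.02, "cat": -0.4}
print(plausibility_iou(attributions, ["dog", "sleeping"]))
print(plausibility_iou(attributions, ["dog", "cat"]))
```

A score near 1 means the explainer highlights the same tokens a human would. Faithfulness is a separate question, because it concerns the model's own decision process rather than human judgment.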


TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications. However, concerns have arisen regarding the trustworthiness of LLMs' outputs, particularly in closed-book question-answering tasks, where non-experts may struggle to identify inaccuracies due to the absence of contextual or ground truth information. This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge. Additionally, TrustScore can seamlessly integrate with fact-checking methods, which assess alignment with external knowledge sources. The experimental results show that TrustScore achieves strong correlations with human judgments, surpassing existing reference-free metrics and achieving results on par with reference-based metrics. Large-scale language models (LLMs) have recently been in the spotlight due to their impressive performance in various NLP tasks, sparking enthusiasm for potential applications (Kaddour et al., 2023; Bubeck et al., 2023). However, a notable concern has emerged regarding the ability of LLMs to generate plausible yet incorrect responses (Tam et al., 2022; Liu et al., 2023; Devaraj et al., 2022), which is particularly challenging for users without specialized expertise. Consequently, users are often advised to employ LLMs in scenarios where they can confidently assess the information provided.


Attacks on Third-Party APIs of Large Language Models

arXiv.org Artificial Intelligence

Large language model (LLM) services have recently begun offering a plugin ecosystem to interact with third-party API services. This innovation enhances the capabilities of LLMs, but it also introduces risks, as these plugins developed by various third parties cannot be easily trusted. This paper proposes a new attacking framework to examine security and safety vulnerabilities within LLM platforms that incorporate third-party services. Applying our framework specifically to widely used LLMs, we identify real-world malicious attacks across various domains on third-party APIs that can imperceptibly modify LLM outputs. The paper discusses the unique challenges posed by third-party API integration and offers strategic possibilities to improve the security and safety of LLM ecosystems moving forward. Our code is released at https://github.com/vk0812/Third-Party-Attacks-on-LLMs.


How can AI be made more secure and trustworthy? - Help Net Security

#artificialintelligence

While we're still debating whether and how long it will take to reach singularity and superintelligence, artificial intelligence is playing an increasingly important role in our everyday lives. Artificial intelligence – most commonly machine learning (ML) – is the process of training algorithms using data, instead of explicitly programming them. Such algorithms are already being used in applications ranging from HR to finance and transport to medicine, and in use cases almost too numerous to mention. The benefits of machine learning are obvious: it enables faster analysis of vastly more data than any human or even group of humans is capable of. Many ML applications that can surpass human capabilities already exist, such as those designed to play Go and Chess, or detect fraudulent insurance claims.


Global Big Data Conference

#artificialintelligence

As real-world AI deployments increase, IBM says the contributions can help ensure they're fair, secure and trustworthy. IBM on Monday announced it's donating a series of open-source toolkits designed to help build trusted AI to a Linux Foundation project, the LF AI Foundation. "Donation of these projects to LFAI will further the mission of creating responsible AI-powered technologies and enable the larger community to come forward and co-create these tools under the governance of Linux Foundation," IBM said in a blog post, penned by Todd Moore, Sriram Raghavan and Aleksandra Mojsilovic. Specifically, IBM is contributing the AI Fairness 360 Toolkit, the Adversarial Robustness 360 Toolbox and the AI Explainability 360 Toolkit.